Offload device #145768

ZuseZ4 · 2025-08-22T21:56:19Z

LLVM's offload functionality usually expects an extra dyn_ptr argument. We could avoid it,b ut likely gonna need it very soon in one of the follow-up PRs (e.g. to request shared memory). So we might as well already add it.

This PR adds a %dyn_ptr ptr to GPUKernel ABI functions, if the offload feature is enabled.

WIP

r? @ghost

ZuseZ4 · 2025-08-22T23:20:13Z

@oli-obk Probably stupid question, but where do set parameter names?
When lowering any GPUKernel Function (including amdgpu_kernel) I add one ptr argument, so for a one-ptr function I instead generate

; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: write)
define amdgpu_kernel void @kernel_1(ptr noundef writeonly captures(none) %x, ptr noundef readnone captures(none) %0) unnamed_addr #0 {
start:
  %1 = addrspacecast ptr %x to ptr addrspace(1)
  store double 2.100000e+01, ptr addrspace(1) %1, align 8
  ret void
}

but want to generate
define amdgpu_kernel void @kernel_1(ptr noundef writeonly captures(none) %0, ptr noundef readnone captures(none) %x) (aka ignore the first argument). My PR (asdf commit) just copies the first arg and the last arg get's a generic name, so I assume we have a list of names on the rust side, which we match starting at the first arg. However, I couldn't find a relevant location where we're calling LLVM functions on the rust side to set those names. Do you have any ideas?

Edit: I decided I'll just do the full rewrite of the module on llvm-ir level instead of MIR, since that's what I know best (and seathlin mentioned MIR is probably too low eithr way). I'll add more details later

ZuseZ4 · 2025-08-27T05:31:56Z

I'll clean this up further later to minimze the amount of c++, but I lost a bit of patience with LLVM's C API, so I just did 100% of the work with the C++ API. With this and the previous (review ready) patch, Rust's amdgcn target runs on a GPU, without manual LLVM-IR rewriting. I'll port it back from C++ to Rust.
This should also run on Nvidia or Intel GPUs, I just didn't got to test it yet.

ZuseZ4 · 2025-09-01T07:38:29Z

compiler/rustc_llvm/llvm-wrapper/RustWrapper.cpp

@@ -170,6 +172,27 @@ extern "C" void LLVMRustPrintStatistics(RustStringRef OutBuf) {
  llvm::PrintStatistics(OS);
 }

+extern "C" void LLVMRustOffloadMapper(LLVMModuleRef M, LLVMValueRef OldFn, LLVMValueRef NewFn) {


I think this is too niche to be worth wrapping in Rust. We would need to introduce the ValueToValueMapTy, and handle the mapping of one value to another. Plus we'd need to expose the used CloneFunctionInto as well as the CloneFunctionChangeTypes.

rust-log-analyzer · 2025-09-01T07:41:53Z

The job tidy failed! Check out the build log: (web) (plain enhanced) (plain)

Click to see the possible cause of the failure (guessed by this bot)

All checks passed!
checking python file formatting
27 files already formatted
checking C++ file formatting
/checkout/compiler/rustc_llvm/llvm-wrapper/RustWrapper.cpp:175:75: error: code should be clang-formatted [-Wclang-format-violations]
extern "C" void LLVMRustOffloadMapper(LLVMModuleRef M, LLVMValueRef OldFn, LLVMValueRef NewFn) {
                                                                          ^

clang-format linting failed! Printing diff suggestions:
--- /checkout/compiler/rustc_llvm/llvm-wrapper/RustWrapper.cpp (actual)
+++ /checkout/compiler/rustc_llvm/llvm-wrapper/RustWrapper.cpp (formatted)
@@ -171,9 +171,10 @@
   auto OS = RawRustStringOstream(OutBuf);
   llvm::PrintStatistics(OS);
 }
 
-extern "C" void LLVMRustOffloadMapper(LLVMModuleRef M, LLVMValueRef OldFn, LLVMValueRef NewFn) {
+extern "C" void LLVMRustOffloadMapper(LLVMModuleRef M, LLVMValueRef OldFn,
+                                      LLVMValueRef NewFn) {
   llvm::Module *module = llvm::unwrap(M);
   llvm::Function *oldFn = llvm::unwrap<llvm::Function>(OldFn);
   llvm::Function *newFn = llvm::unwrap<llvm::Function>(NewFn);
 

tidy error: checks with external tool 'clang-format' failed
some tidy checks failed
Bootstrap failed while executing `test src/tools/tidy tidyselftest --extra-checks=py,cpp,js,spellcheck`
Command `/checkout/obj/build/x86_64-unknown-linux-gnu/stage1-tools-bin/rust-tidy /checkout /checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo /checkout/obj/build 4 /node/bin/npm --extra-checks=py,cpp,js,spellcheck` failed with exit code 1
Created at: src/bootstrap/src/core/build_steps/tool.rs:1548:23
Executed at: src/bootstrap/src/core/build_steps/test.rs:1296:29

Command has failed. Rerun with -v to see more details.
Build completed unsuccessfully in 0:01:16
  local time: Mon Sep  1 07:41:41 UTC 2025
  network time: Mon, 01 Sep 2025 07:41:41 GMT
##[error]Process completed with exit code 1.
Post job cleanup.

ZuseZ4 · 2025-09-01T07:56:58Z

compiler/rustc_codegen_llvm/src/back/write.rs

+    }
+
+    let consider_offload = config.offload.contains(&config::Offload::Enable);
+    if consider_offload && (cgcx.target_arch == "amdgpu" || cgcx.target_arch == "nvptx64") {


these should probably be combined to a target_is_gpu, similar to target_is_like_darwin and target_is_like_aix

ZuseZ4 · 2025-09-01T08:07:13Z

r? @oli-obk

This adds two more commits on top of the other pr which fixes the host code generation.
Together, these 4 commits are enough to compile a rust function and run it on a gpu, I left instructions in the dev guide for it: https://rustc-dev-guide.rust-lang.org/offload/usage.html

rustbot · 2025-09-01T08:07:17Z

oli-obk is not on the review rotation at the moment.
They may take a while to respond.

This comment has been minimized.

Sign in to view

ZuseZ4 added 3 commits August 31, 2025 15:17

fix host code

f70c6f4

model offload C++ structs through Rust structs

0f05703

fix device code generation

cdbbe9c

ZuseZ4 force-pushed the offload-device branch from 773389b to cdbbe9c Compare September 1, 2025 02:53

This comment has been minimized.

Sign in to view

upgrade offload dyn_ptr handling from C++ to mostly safe Rust

dd5af93

ZuseZ4 commented Sep 1, 2025

View reviewed changes

rustbot assigned oli-obk Sep 1, 2025

ZuseZ4 marked this pull request as ready for review September 1, 2025 08:07

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Sep 1, 2025

ZuseZ4 mentioned this pull request Jul 24, 2025

Tracking Issue for GPU-offload #131513

Open

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Offload device #145768

Offload device #145768

ZuseZ4 commented Aug 22, 2025

Uh oh!

This comment has been minimized.

ZuseZ4 commented Aug 22, 2025 •

edited

Loading

Uh oh!

This comment has been minimized.

ZuseZ4 commented Aug 27, 2025

Uh oh!

This comment has been minimized.

ZuseZ4 Sep 1, 2025

Uh oh!

rust-log-analyzer commented Sep 1, 2025

Uh oh!

ZuseZ4 Sep 1, 2025

Uh oh!

ZuseZ4 commented Sep 1, 2025

Uh oh!

rustbot commented Sep 1, 2025

Uh oh!

Uh oh!

Offload device #145768

Are you sure you want to change the base?

Offload device #145768

Conversation

ZuseZ4 commented Aug 22, 2025

Uh oh!

This comment has been minimized.

ZuseZ4 commented Aug 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

This comment has been minimized.

ZuseZ4 commented Aug 27, 2025

Uh oh!

This comment has been minimized.

ZuseZ4 Sep 1, 2025

Choose a reason for hiding this comment

Uh oh!

rust-log-analyzer commented Sep 1, 2025

Uh oh!

ZuseZ4 Sep 1, 2025

Choose a reason for hiding this comment

Uh oh!

ZuseZ4 commented Sep 1, 2025

Uh oh!

rustbot commented Sep 1, 2025

Uh oh!

Uh oh!

ZuseZ4 commented Aug 22, 2025 •

edited

Loading